1,412 research outputs found
An extended Stein-type covariance identity for the Pearson family with applications to lower variance bounds
For an absolutely continuous (integer-valued) r.v. of the Pearson (Ord)
family, we show that, under natural moment conditions, a Stein-type covariance
identity of a given order holds (cf. [Goldstein and Reinert, J. Theoret. Probab. 18
(2005) 237–260]). This identity is closely related to the corresponding
sequence of orthogonal polynomials, obtained by a Rodrigues-type formula, and
provides convenient expressions for the Fourier coefficients of an arbitrary
function. Application of the covariance identity yields some novel expressions
for the corresponding lower variance bounds for a function of the r.v.,
expressions that seem to be known only in particular cases (for the Normal, see
[Houdré and Kagan, J. Theoret. Probab. 8 (1995) 23–30]; see also
[Houdré and Pérez-Abreu, Ann. Probab. 23 (1995) 400–419] for
corresponding results related to the Wiener and Poisson processes). Some
applications are also given.
Comment: Published in Bernoulli (http://isi.cbs.nl/bernoulli/) at
http://dx.doi.org/10.3150/10-BEJ282 by the International Statistical
Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm)
Strengthened Chernoff-type variance bounds
Consider an absolutely continuous random variable from the integrated
Pearson family that has finite moments of any order. Using some
properties of the associated orthonormal polynomial system, we provide a class
of strengthened Chernoff-type variance bounds.
Comment: Published in Bernoulli (http://isi.cbs.nl/bernoulli/) at
http://dx.doi.org/10.3150/12-BEJ484 by the International Statistical
Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm)
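For orientation, the classical normal-case results behind the two abstracts above can be written out as follows. This is only a sketch of the well-known special case, not the Pearson-family statements of either paper; the symbols $X$, $g$, $\mu$ and $\sigma^2$ are notation assumed here, with $X \sim N(\mu, \sigma^2)$ and $g$ absolutely continuous with $\mathbb{E}[g'(X)^2] < \infty$.

```latex
% Stein's covariance identity for the normal distribution
\operatorname{Cov}\bigl(X,\, g(X)\bigr) = \sigma^{2}\, \mathbb{E}\bigl[g'(X)\bigr]

% Chernoff's upper variance bound and the matching lower bound
\sigma^{2}\, \bigl(\mathbb{E}[g'(X)]\bigr)^{2}
  \;\le\; \operatorname{Var}\bigl(g(X)\bigr)
  \;\le\; \sigma^{2}\, \mathbb{E}\bigl[g'(X)^{2}\bigr]
```

The lower bound follows from the covariance identity and the Cauchy-Schwarz inequality, since $\operatorname{Cov}(X, g(X))^2 \le \sigma^2 \operatorname{Var}(g(X))$; both bounds are attained when $g$ is linear.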
Optimal Piecewise Linear Regression Algorithm for QSAR Modelling
Quantitative Structure‐Activity Relationship (QSAR) models have been successfully applied to lead optimisation, virtual screening and other areas of drug discovery. Recent studies, however, have focused on developing models that are predictive but often not interpretable. In this article, we propose the application of a piecewise linear regression algorithm, OPLRAreg, to develop QSAR models that are both predictive and interpretable. The algorithm determines the feature that best separates the data into regions and identifies a linear equation to predict the outcome variable in each region. A regularisation term is introduced to prevent overfitting and to implicitly select the most informative features. As OPLRAreg is based on mathematical programming, a flexible and transparent representation for optimisation problems, the algorithm also permits customised constraints to be easily added to the model. The proposed algorithm is presented as a more interpretable alternative to other commonly used machine learning algorithms and has shown predictive accuracy comparable to Random Forest, Support Vector Machine and Random Generalised Linear Model in tests on five QSAR data sets compiled from the ChEMBL database.
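The idea of splitting the data on one feature and fitting a separate line in each region can be illustrated with a small sketch. This is not OPLRAreg: it swaps the mathematical programming formulation for an exhaustive least-squares search over break points on a single feature, and it omits the regularisation term. The function names `fit_line` and `best_split` and the `min_size` parameter are hypothetical, introduced here for illustration only.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, sse)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    # A region with a single distinct x value gets a flat line.
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    a = my - b * mx
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse


def best_split(xs, ys, min_size=2):
    """Try every break point on one feature; keep the lowest total SSE.

    Assumption: exhaustive search, not the optimisation model of the paper.
    """
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(min_size, len(pairs) - min_size + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical feature values
        la, lb, lsse = fit_line(*zip(*pairs[:i]))
        ra, rb, rsse = fit_line(*zip(*pairs[i:]))
        total = lsse + rsse
        if best is None or total < best["sse"]:
            best = {
                "threshold": (pairs[i - 1][0] + pairs[i][0]) / 2,
                "left": (la, lb),    # intercept, slope for x below threshold
                "right": (ra, rb),   # intercept, slope for x above threshold
                "sse": total,
            }
    return best


# Toy data: y = x for x < 4 and y = 10 - x for x >= 4.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0, 1, 2, 3, 6, 5, 4, 3]
model = best_split(xs, ys)
```

Each region keeps its own intercept and slope, which is what makes this kind of model easy to read: a prediction is explained by one threshold test and one linear equation.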
Development of text mining tools for information retrieval from patents
Biomedical literature is composed of an ever-increasing number of publications in natural language. Patents are a relevant fraction of those, being important sources of information due to all the curated data from the granting process. However, their unstructured data makes searching for information a challenging task. To overcome this, Biomedical text mining (BioTM) creates methodologies to search and structure that data. Several BioTM techniques can be applied to patents. Among them, Information Retrieval is the process by which relevant data is obtained from collections of documents. In this work, a patent pipeline was developed and integrated into …
Funding: FEDER - Federación Española de Enfermedades Raras (NORTE-01-0145-FEDER-000004)
Linear Estimation of Location and Scale Parameters Using Partial Maxima
Consider an i.i.d. sample X^*_1,X^*_2,...,X^*_n from a location-scale family,
and assume that the only available observations consist of the partial maxima
(or minima) sequence, X^*_{1:1},X^*_{2:2},...,X^*_{n:n}, where
X^*_{j:j}=max{X^*_1,...,X^*_j}. This kind of truncation appears in several
circumstances, including best performances in athletics events. In the case of
partial maxima, the form of the BLUEs (best linear unbiased estimators) is
quite similar to the form of the well-known Lloyd's (1952, Least-squares
estimation of location and scale parameters using order statistics, Biometrika,
vol. 39, pp. 88-95) BLUEs, based on (the sufficient sample of) order
statistics, but, in contrast to the classical case, their consistency is no
longer obvious. The present paper is mainly concerned with the scale parameter,
showing that the variance of the partial maxima BLUE is at most of order
O(1/log n), for a wide class of distributions.
Comment: This article is devoted to the memory of my six-year-old little
daughter, Dionyssia, who left us on August 25, 2010, at Cephalonia island. (26
pages, to appear in Metrika)
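The partial maxima sequence defined in the abstract above, X^*_{j:j}=max{X^*_1,...,X^*_j}, is simply a running maximum. A minimal sketch, assuming the sample is given as a Python list; the function name `partial_maxima` and the example values are illustrative and not taken from the paper:

```python
from itertools import accumulate


def partial_maxima(sample):
    """Return [max(sample[:1]), max(sample[:2]), ..., max(sample)]."""
    return list(accumulate(sample, max))


# Hypothetical sample of n = 8 observations.
sample = [3, 1, 4, 1, 5, 9, 2, 6]
maxima = partial_maxima(sample)
```

Only the running maxima would be available to the estimator, so many of the original observations are hidden behind repeated values; this is the kind of truncation the abstract refers to.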
Subclinical VZV reactivation in immunocompetent children hospitalized in the ICU associated with prolonged fever duration
A prospective observational study was conducted to examine whether asymptomatic VZV reactivation occurs in immunocompetent children hospitalized in an ICU and its impact on clinical outcome. A secondary aim was to test the hypothesis that vaccinated children have a lower risk of reactivation than naturally infected children. Forty immunocompetent paediatric ICU patients and healthy controls were enrolled. Patients were prospectively followed for 28 days. Clinical data were collected and varicella exposure was recorded. Admission serum levels of TNF-α, cortisol and VZV-IgG were measured. Blood and saliva samples were collected for VZV-DNA detection via real-time PCR. As a comparison, the detection of HSV-DNA was also examined. Healthy children matched for age and varicella exposure type (infection or vaccination) were also included. VZV reactivation was observed in 17% (7/39) of children. Children with VZV reactivation had extended duration of fever (OR = 1.17; 95% CI, 1.02–1.34). None of the varicella-vaccinated children or healthy controls had detectable VZV-DNA in any blood or saliva samples examined. HSV-DNA was detected in saliva from 33% of ICU children and 2.6% of healthy controls. Among children with viral reactivation, typing revealed wild-type VZV and HSV-1. In conclusion, VZV reactivation occurs in immunocompetent children under severe stress and is associated with prolonged duration of fever.
Target identification of hits using a concerted chemogenomic, biophysical and structural approach
Mycobacterium phenotypic hits are a good reservoir for new chemotypes for the treatment of tuberculosis. However, the absence
of defined molecular targets and modes of action could lead to failure in drug development. Therefore, a combination of
ligand-based and structure-based chemogenomic approaches followed by biophysical and biochemical validation has been used to
identify targets for Mycobacterium tuberculosis phenotypic hits. Our approach identified EthR and InhA as targets for several hits,
with some showing dual activity against these proteins. Of the 35 predicted EthR inhibitors, eight exhibited an IC50 below 50
μM against M. tuberculosis EthR and three were confirmed to be simultaneously active against InhA. Further hit validation was
performed using X-ray crystallography yielding eight new crystal structures of EthR inhibitors. Although the EthR inhibitors attain
their activity against M. tuberculosis by hitting yet undefined targets, these results provide new lead compounds that could be
further developed to be used to potentiate the effect of EthA activated pro-drugs, such as ethionamide, thus enhancing their
bactericidal effect.
GM is grateful to the European Molecular Biology Laboratory and Marie Sklodowska-Curie Actions for funding this work. VM and MB
acknowledge the Bill & Melinda Gates Foundation [subcontract by the Foundation for the National Institutes of Health (NIH)]
(OPP1024021). VM and MS acknowledge the European Community’s Seventh Framework Programme [grant number 260872]. GP
would like to acknowledge the Wellcome Trust and the European Molecular Biology Laboratory for funding. JPO was funded by the
member nation states of the European Molecular Biology Laboratory. TLB acknowledges The Wellcome Trust for funding and
support (grant number 200814/Z/16/Z).
A document classifier for medicinal chemistry publications trained on the ChEMBL corpus
Background
The large increase in the number of scientific publications has fuelled a need for semi- and fully automated text mining approaches to assist in the triage process, both for individual scientists and for larger-scale data extraction and curation into public databases. Here, we introduce a document classifier that is able to successfully distinguish between publications that are ‘ChEMBL-like’ (i.e. related to small molecule drug discovery and likely to contain quantitative bioactivity data) and those that are not. The unprecedented size of the medicinal chemistry literature collection, coupled with the advantage of manual curation and mapping to chemistry and biology, makes the ChEMBL corpus a unique resource for text mining.
Results
The method has been implemented as a data protocol/workflow for both Pipeline Pilot (version 8.5) and KNIME (version 2.9). Both workflows and models are freely available at: ftp://ftp.ebi.ac.uk/pub/databases/chembl/text-mining. These can be readily modified to include additional keyword constraints to further focus searches.
Conclusions
Large-scale machine learning document classification was shown to be very robust and flexible for this particular application, as illustrated in four distinct text-mining-based use cases. The models are readily available on two data workflow platforms, which we believe will allow the majority of the scientific community to apply them to their own data.
Evaluation of machine-learning methods for ligand-based virtual screening
Machine-learning methods can be used for virtual screening by analysing the structural characteristics of molecules of known (in)activity, and we here discuss the use of kernel discrimination and naive Bayesian classifier (NBC) methods for this purpose. We report a kernel method that allows the processing of molecules represented by binary, integer and real-valued descriptors, and show that it differs little in screening performance from a previously described kernel that had been developed specifically for the analysis of binary fingerprint representations of molecular structure. We then evaluate the performance of an NBC when the training set contains only very few active molecules. In such cases, a simpler approach based on group fusion would appear to provide superior screening performance, especially when structurally heterogeneous datasets are to be processed.
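As a rough illustration of the group fusion idea mentioned above, the sketch below scores each database molecule by its similarity to a small set of known actives and fuses the scores with the maximum rule. It assumes binary fingerprints stored as Python sets of on-bit indices and the Tanimoto coefficient as the similarity measure; the abstract specifies neither choice, and the names `tanimoto` and `group_fusion_rank` as well as the toy data are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def group_fusion_rank(actives, database, fuse=max):
    """Rank database molecules by fused similarity to the reference actives.

    Assumption: `fuse` combines the per-reference scores (max by default).
    """
    scored = [
        (fuse(tanimoto(fp, ref) for ref in actives), name)
        for name, fp in database.items()
    ]
    # Highest fused score first; ties broken by name for a stable order.
    return [name for score, name in sorted(scored, key=lambda t: (-t[0], t[1]))]


# Toy example: two structurally different reference actives.
actives = [{1, 2, 3}, {7, 8}]
database = {"m1": {1, 2, 3, 4}, "m2": {7, 8}, "m3": {5, 6}}
ranking = group_fusion_rank(actives, database)
```

Because each molecule only needs to resemble one of the references to score highly, this kind of fusion needs no trained model, which is one plausible reason it copes with very few, heterogeneous actives.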